1,044 research outputs found

    DnaSP v5: A software for comprehensive analysis of DNA polymorphism data

    Software available at: http://hdl.handle.net/2445/53451
    DnaSP is a software package for the comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods that allow extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.

    Disperse—a software system for design of selector probes for exon resequencing applications

    Summary: Selector probes enable the multiplex amplification of many selected regions of the genome. Disperse is a software pipeline that automates the design of selector probes for exon resequencing applications.

    An Overview of the Use of Neural Networks for Data Mining Tasks

    In recent years the area of data mining has experienced considerable demand for technologies that extract knowledge from large and complex data sources. There is substantial commercial interest, as well as research, in developing new and improved approaches for extracting information, relationships, and patterns from datasets. Artificial Neural Networks (NN) are popular biologically inspired intelligent methodologies whose classification, prediction and pattern recognition capabilities have been used successfully in many areas, including science, engineering, medicine, business, banking and telecommunication. This paper highlights, from a data mining perspective, the implementation of NN using supervised and unsupervised learning for pattern recognition, classification, prediction and cluster analysis, and focuses the discussion on their use in bioinformatics and financial data analysis tasks.

    Beyond element-wise interactions: identifying complex interactions in biological processes

    Background: Biological processes typically involve the interactions of a number of elements (genes, cells) acting on each other. Such processes are often modelled as networks whose nodes are the elements in question and whose edges are pairwise relations between them (transcription, inhibition). More often than not, however, elements work cooperatively or competitively to achieve a task, or one element acts on the interaction between two others, as when an enzyme controls a reaction rate. We call these types of interaction "complex" and propose ways to identify them from time-series observations. Methodology: We use Granger causality, a measure of the interaction between two signals, to characterize the influence of an enzyme on a reaction rate. We extend its traditional formulation to multi-dimensional signals in order to capture group interactions and not only element interactions. Our method is extensively tested on simulated data and applied to three biological datasets: microarray data from the yeast Saccharomyces cerevisiae, local field potential recordings from two brain areas, and a metabolic reaction. Conclusions: Our results demonstrate that complex Granger causality can reveal new types of relation between signals and is particularly suited to biological data. Our approach raises some fundamental issues for the systems biology approach, since finding all complex causalities (interactions) is an NP-hard problem.
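
    The traditional pairwise formulation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' multi-dimensional extension: it assumes a linear autoregressive model fitted by ordinary least squares, and the function names, the model order and the simulated signals are all hypothetical. The measure is the log ratio of the residual variance when y is predicted from its own past to the residual variance when the past of x is added.

```python
import numpy as np

def lagged(series, order):
    """Stack lags 1..order of a 1-D series as columns, aligned with series[order:]."""
    n = len(series)
    return np.column_stack([series[order - k : n - k] for k in range(1, order + 1)])

def residual_variance(target, design):
    """Least-squares fit of target on design plus an intercept; return residual variance."""
    design = np.column_stack([np.ones(len(target)), design])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return resid.var()

def granger_causality(x, y, order=2):
    """Return ln(var_restricted / var_full) for the hypothesis 'x Granger-causes y'."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[order:]
    own_past = lagged(y, order)                                # restricted model: past of y only
    both_past = np.column_stack([own_past, lagged(x, order)])  # full model: past of y and x
    return np.log(residual_variance(target, own_past) / residual_variance(target, both_past))

# Simulated (hypothetical) signals in which x drives y with a one-step delay.
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.standard_normal()

gc_xy = granger_causality(x, y)  # expected to be clearly positive
gc_yx = granger_causality(y, x)  # expected to be close to zero
```

    Because the restricted model is nested in the full one, the value is never negative; in practice a significance test (for example an F-test on the two residual sums of squares) would be needed before reading a small positive value as an interaction.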

    MagicViewer: integrated solution for next-generation sequencing data visualization and genetic variation detection and annotation

    New sequencing technologies, such as Roche 454, ABI SOLiD and Illumina, have been developed at an astounding pace, with the advantages of high throughput and reduced time and cost. To satisfy the impending need to decipher the large-scale data generated by next-generation sequencing, an integrated software tool, MagicViewer, has been developed to easily visualize short read mappings and to identify and annotate genetic variation based on the reference genome. MagicViewer provides a user-friendly environment in which large-scale short reads can be displayed in a zoomable interface under a user-defined color scheme in an operating system-independent manner. It also contains a versatile computational pipeline for genetic variation detection, filtration, annotation and visualization, providing search options, functional classification, subset selection, sequence association and primer design. In conclusion, MagicViewer is a sophisticated assembly visualization and genetic variation annotation tool for next-generation sequencing data, which can be widely used in a variety of sequencing-based research, including genome re-sequencing and transcriptome studies. MagicViewer is freely available at http://bioinformatics.zj.cn/magicviewer/

    A Revised Design for Microarray Experiments to Account for Experimental Noise and Uncertainty of Probe Response

    Background: Although microarrays are analysis tools in biomedical research, they are known to yield noisy output that usually requires experimental confirmation. To tackle this problem, many studies have developed rules for optimizing probe design and devised complex statistical tools to analyze the output. However, less emphasis has been placed on systematically identifying the noise component as part of the experimental procedure. One source of noise is the variance in probe binding, which can be assessed by replicating array probes. The second source is poor probe performance, which can be assessed by calibrating the array based on a dilution series of target molecules. Using model experiments for copy number variation and gene expression measurements, we investigate here a revised design for microarray experiments that addresses both of these sources of variance. Results: Two custom arrays were used to evaluate the revised design: one based on 25-mer probes from an Affymetrix design and the other based on 60-mer probes from an Agilent design. To assess experimental variance in probe binding, all probes were replicated ten times. To assess probe performance, the probes were calibrated using a dilution series of target molecules and the signal response was fitted to an adsorption model. We found that significant variance of the signal could be controlled by averaging across probes and removing probes that are nonresponsive or poorly responsive in the calibration experiment. Taking this into account, one can obtain a more reliable signal with the added option of obtaining absolute rather than relative measurements. Conclusion: The assessment of technical variance within the experiments, combined with the calibration of probes, allows poorly responding probes to be removed and yields more reliable signals for the remaining ones. Once an array is properly calibrated, absolute quantification of signals becomes straightforward, alleviating the need for normalization and reference hybridizations.
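
    The calibration step described in this abstract can be illustrated with a small sketch. The abstract does not say which adsorption model was used, so the Langmuir isotherm below is an assumption, as are the function names, the dilution series and the parameter values. The fit uses a simple double-reciprocal linearization rather than a nonlinear optimizer, and the synthetic probe response is noise-free.

```python
import numpy as np

def langmuir(conc, s_max, k_d):
    """Assumed Langmuir response: signal saturates at s_max, half-saturation at k_d."""
    return s_max * conc / (k_d + conc)

def fit_langmuir(conc, signal):
    """Double-reciprocal fit: 1/S = 1/s_max + (k_d / s_max) * (1/c)."""
    slope, intercept = np.polyfit(1.0 / conc, 1.0 / signal, 1)
    s_max = 1.0 / intercept
    k_d = slope * s_max
    return s_max, k_d

def concentration_from_signal(signal, s_max, k_d):
    """Invert the calibrated curve to get an absolute concentration estimate."""
    return k_d * signal / (s_max - signal)

# Hypothetical dilution series and a synthetic, noise-free probe response.
conc = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
signal = langmuir(conc, s_max=1000.0, k_d=3.0)

s_max_hat, k_d_hat = fit_langmuir(conc, signal)
estimated = concentration_from_signal(langmuir(5.0, 1000.0, 3.0), s_max_hat, k_d_hat)
```

    With real, noisy replicate measurements the reciprocal transform amplifies errors at low signal, so a weighted or nonlinear fit would usually be preferable; a probe whose fitted curve is flat across the dilution series would be a candidate for removal as nonresponsive.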

    Rapid flow-sorting to simultaneously resolve multiplex massively parallel sequencing products

    Sample preparation for Roche/454, ABI/SOLiD and Life Technologies/Ion Torrent sequencing is based on amplification of library fragments on the surface of beads prior to sequencing. Commonly, libraries are barcoded and pooled to maximise the sequence output of each sequencing run. Here, we describe a novel approach for normalization of multiplex next-generation sequencing libraries after emulsion PCR. Briefly, amplified libraries carrying unique barcodes are prepared by fluorescent tagging of complementary sequences and then resolved by high-speed flow cytometric sorting of labeled emulsion PCR beads. The protocol is simple and provides an even sequence distribution of multiplex libraries when sequencing the flow-sorted beads. Moreover, since many empty and mixed emulsion PCR beads are removed, the approach gives rise to a substantial increase in sequence quality and mean read length compared with standard enrichment protocols.

    Visualizing dimensionality reduction of systems biology data

    One of the challenges in analyzing high-dimensional expression data is the detection of important biological signals. A common approach is to apply a dimension reduction method, such as principal component analysis. Typically, after application of such a method, the data are projected and visualized in the new coordinate system using scatter plots or profile plots. These methods provide good results if the data have certain properties that become visible in the new coordinate system and were hard to detect in the original one. Often, however, a single method does not suffice to capture all important signals, so several methods addressing different aspects of the data need to be applied. We have developed a framework for linear and non-linear dimension reduction methods within our visual analytics pipeline SpRay. This includes measures that assist the interpretation of the factorization result. Different visualizations of these measures can be combined with functional annotations that support the interpretation of the results. We show an application to high-resolution time series microarray data from the antibiotic-producing organism Streptomyces coelicolor, as well as to microarray data measuring expression in cells with normal karyotype and cells with trisomies of human chromosomes 13 and 21.
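
    The projection step described in this abstract, for the principal component analysis case, can be sketched as follows. This is a generic illustration and not the SpRay implementation: the function name, the matrix dimensions and the simulated expression data are hypothetical, and the decomposition is computed with a singular value decomposition of the centered data.

```python
import numpy as np

def pca(data, n_components=2):
    """Project samples (rows) onto the leading principal components.

    Returns the projected scores, the component directions (one per row)
    and the fraction of total variance explained by each kept component.
    """
    centered = data - data.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, components, explained[:n_components]

# Hypothetical data: 50 samples x 200 genes with one dominant latent signal plus noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((50, 1))
loadings = rng.standard_normal((1, 200))
data = latent @ loadings * 5.0 + rng.standard_normal((50, 200))

scores, components, explained = pca(data)
```

    The two columns of `scores` are what would be drawn in a scatter plot, and `explained` is one simple example of a measure that helps judge how much of the data a given projection captures.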

    Comparing Platforms for C. elegans Mutant Identification Using High-Throughput Whole-Genome Sequencing

    Whole-genome sequencing represents a promising approach to pinpoint chemically induced mutations in genetic model organisms, thereby short-cutting time-consuming genetic mapping efforts. We compare here the ability of two leading high-throughput platforms for paired-end deep sequencing, SOLiD (ABI) and Genome Analyzer (Illumina; "Solexa"), to achieve the goal of mutant detection. As a test case we used a mutant C. elegans strain that harbors a mutation in the lsy-12 locus, which we compare to the reference wild-type genome sequence. We analyzed the accuracy, sensitivity, and depth-coverage characteristics of the two platforms. Both platforms were able to identify the mutation that causes the phenotype of the mutant C. elegans strain, lsy-12. Based on a 4 MB genomic region in which individual variants were validated by Sanger sequencing, we observe tradeoffs between rates of false positives and false negatives when using both platforms under similar coverage and mapping criteria. In conclusion, whole-genome sequencing conducted by either platform is a viable approach for the identification of single-nucleotide variations in the C. elegans genome.